Support hashing vectorizer inside a union #176
Refactor, start a test
docs/source/libraries/sklearn.rst
@@ -219,6 +219,17 @@ automatically; to handle HashingVectorizer_ and FeatureHasher_ for
# and ``ivec`` can be used as a vectorizer for eli5.explain_weights:
eli5.explain_weights(clf, vec=ivec)

HashingVectorizer_ is also supported inside a FeatureUnion_:
:func:`eli5.explain_prediction` handles this case automatically, and for
:func:`eli5.explain_weights` you can use :func:`eli5.sklearn.invert_and_fit``
There is an extra ` at the end of the line.
Right, thanks for spotting it, it's so tiny! Fixed in 94d81e9
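The docs snippet above relies on an inverted hashing vectorizer (`ivec`) that is fitted on documents so it can recall which terms were hashed into each column, and with which sign. The following is a minimal pure-Python sketch of that idea, not eli5's implementation: `ToyInvertableHasher` is a hypothetical name, `zlib.crc32` stands in for the MurmurHash3 that scikit-learn uses (so column indices will not match a real HashingVectorizer), and the way the sign is derived is an assumption chosen for illustration. Each column's feature name is a list of {'name': ..., 'sign': ...} dicts, the same shape discussed later in this PR.

```python
import zlib
from collections import defaultdict


class ToyInvertableHasher:
    """Hypothetical stand-in for an invertible signed hashing vectorizer.

    Uses zlib.crc32 instead of scikit-learn's MurmurHash3, so column
    indices will NOT match a real HashingVectorizer.
    """

    def __init__(self, n_features=16):
        self.n_features = n_features
        # column index -> {term: sign}; filled in by fit()
        self.column_terms_ = defaultdict(dict)

    def _column_and_sign(self, term):
        h = zlib.crc32(term.encode("utf-8"))  # unsigned 32-bit on Python 3
        column = h % self.n_features
        # Assumption: the top bit of the 32-bit hash picks the sign
        sign = 1 if ((h >> 31) & 1) == 0 else -1
        return column, sign

    def fit(self, docs):
        # Remember every term seen in docs, keyed by the column it hashes to
        for doc in docs:
            for term in doc.lower().split():
                column, sign = self._column_and_sign(term)
                self.column_terms_[column][term] = sign
        return self

    def transform(self, doc):
        # Signed hashing trick: add +1 or -1 to each term's column
        row = [0] * self.n_features
        for term in doc.lower().split():
            column, sign = self._column_and_sign(term)
            row[column] += sign
        return row

    def feature_names(self, column):
        # One {'name', 'sign'} dict per seen term that collided into this
        # column; columns with no seen terms get an empty list
        terms = self.column_terms_.get(column, {})
        return [{"name": t, "sign": s} for t, s in sorted(terms.items())]


hasher = ToyInvertableHasher(n_features=8).fit(["the cat sat", "the dog ran"])
row = hasher.transform("the cat")
for column, value in enumerate(row):
    if value:
        print(column, value, hasher.feature_names(column))
```

With only 8 columns, several terms may share a column; that collision is exactly why each name is a list rather than a single string.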
Without this fix, sklearn raises an error in sklearn/feature_extraction/hashing.py, line 142: TypeError: Expected bytes, got numpy.string_
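The error message suggests that the hashing code accepts only objects whose type is exactly `bytes` (or `str`), while `numpy.string_` (an alias of `numpy.bytes_`) is a subclass of `bytes`. Below is a hedged, pure-Python illustration of that failure mode and of one generic workaround, coercing subclass instances back to the built-in type before hashing; it is not the change made in this PR. `strict_hash` and `to_builtin_str` are hypothetical helpers, and `FakeNumpyBytes` only imitates `numpy.bytes_` so the example runs without NumPy.

```python
class FakeNumpyBytes(bytes):
    """Imitates numpy.bytes_, which subclasses bytes."""


def strict_hash(key):
    # Hypothetical hasher with an exact-type check, similar in spirit to
    # the check that produces "Expected bytes, got numpy.string_"
    if type(key) not in (bytes, str):
        raise TypeError("Expected bytes, got %s" % type(key).__name__)
    return hash(key)


def to_builtin_str(key):
    # bytes(x) and str(x) return the exact built-in type even when x is an
    # instance of a subclass, so the strict check passes afterwards
    if isinstance(key, bytes):
        return bytes(key)
    if isinstance(key, str):
        return str(key)
    return key


key = FakeNumpyBytes(b"feature")
try:
    strict_hash(key)
except TypeError as exc:
    print("without coercion:", exc)
print("with coercion:", strict_hash(to_builtin_str(key)))
```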
Codecov Report
@@ Coverage Diff @@
## master #176 +/- ##
==========================================
+ Coverage 97.25% 97.34% +0.09%
==========================================
Files 39 39
Lines 2405 2450 +45
Branches 452 464 +12
==========================================
+ Hits 2339 2385 +46
Misses 34 34
+ Partials 32 31 -1
@kmike this is ready for review; could you please check it again? I updated the PR description and added a py2 fix. This feature could be less complicated if InvertableHashingVectorizer returned feature names as strings rather than as lists of {'name': , 'sign': } dicts: in that case we would not need the code that is currently in
The main concern I have here is the name of the
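To make the trade-off above concrete: when a union's components return plain string feature names, prefixing them with the component name is a one-liner, but list-of-dicts names need extra handling. The sketch below is a hypothetical helper, not code from eli5. The `name__feature` prefix mirrors the convention scikit-learn's `FeatureUnion` uses in `get_feature_names`; the `" | "` separator and the `(-)` marker for negative signs are assumptions chosen for display.

```python
def prefixed_name(prefix, name):
    """Prefix one feature name with a union component name.

    ``name`` is either a plain string or a list of {'name': ..., 'sign': ...}
    dicts; dict names keep their structure so the sign is not lost.
    """
    if isinstance(name, str):
        return "%s__%s" % (prefix, name)
    return [dict(d, name="%s__%s" % (prefix, d["name"])) for d in name]


def union_feature_names(components):
    # components: (component_name, feature_names) pairs, in the same order
    # as the union's transformer_list
    return [prefixed_name(prefix, name)
            for prefix, names in components
            for name in names]


def as_display_string(name):
    # Flatten a list-of-dicts name into a single string for display
    if isinstance(name, str):
        return name
    return " | ".join(
        ("(-)" if d["sign"] < 0 else "") + d["name"] for d in name)


names = union_feature_names([
    ("words", [[{"name": "cat", "sign": 1}, {"name": "dog", "sign": -1}]]),
    ("meta", ["length"]),
])
print([as_display_string(n) for n in names])
```

If every component already returned strings, only the first branch of `prefixed_name` would be needed.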
eli5/sklearn/unhashing.py
""" Create an InvertableHashingVectorizer from hashing vectorizer vec | ||
and fit it on docs. If vec is a FeatureUnion, do it for all | ||
hashing vectorizers in the union. | ||
Returns an InvertableHashingVectorizer, or a Union, or an unchanged vectorizer. |
Union -> FeatureUnion?
Right, fixed in b89d0f3. Thanks!
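The docstring above describes three outcomes. Here is a minimal sketch of that dispatch logic, assuming simplified stand-in classes rather than the real scikit-learn and eli5 ones: `HashingStub`, `InvertedHashingStub` and `UnionStub` are hypothetical, `UnionStub` only mimics `FeatureUnion.transformer_list` (a list of `(name, transformer)` pairs), and `invert_and_fit_sketch` is not the function added in this PR.

```python
class HashingStub:
    """Hypothetical stand-in for a hashing vectorizer."""


class InvertedHashingStub:
    """Hypothetical stand-in for an inverted (unhashing) vectorizer."""

    def __init__(self, vec):
        self.vec = vec
        self.fitted_on = None

    def fit(self, docs):
        self.fitted_on = list(docs)
        return self


class UnionStub:
    """Mimics FeatureUnion.transformer_list: (name, transformer) pairs."""

    def __init__(self, transformer_list):
        self.transformer_list = transformer_list


def invert_and_fit_sketch(vec, docs):
    # Case 1: a bare hashing vectorizer -> wrap it and fit on docs
    if isinstance(vec, HashingStub):
        return InvertedHashingStub(vec).fit(docs)
    # Case 2: a union -> build a new union, inverting only hashing parts.
    # This sketch recurses, so nested unions are handled too.
    if isinstance(vec, UnionStub):
        docs = list(docs)  # docs may be a one-shot iterator; reuse safely
        return UnionStub([
            (name, invert_and_fit_sketch(transformer, docs))
            for name, transformer in vec.transformer_list
        ])
    # Case 3: anything else is returned unchanged
    return vec


union = UnionStub([("text", HashingStub()), ("extra", object())])
inverted = invert_and_fit_sketch(union, iter(["a b", "c"]))
print([type(t).__name__ for _, t in inverted.transformer_list])
```

Building a new union instead of mutating `vec.transformer_list` in place leaves the caller's original vectorizer untouched.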
Yeah, I think it should be named either
Thanks @kmike for the suggestion! Also fix docstring: Union -> FeatureUnion
Yes,
Looks good, thanks @lopuhin!
Thanks for the review, @kmike!
Also add an eli5.sklearn.invert_and_fit helper that inverts and fits a vectorizer (even if it's hidden inside a union).
The fix in 85c8a61 is not strictly related to this PR; I think it's an old issue revealed by the new tests.
My primary motivation was to support explaining deep-deep models and their predictions.
Fixes #16